Method and device for selecting a sound algorithm

ABSTRACT

The invention relates to a method for selecting a sound algorithm for processing an audio signal. The audio signal is analyzed and the type of audio signal is ascertained based on the analysis. The audio signal is classified as a music signal or another signal, and different sound algorithms are used for the further processing and subsequent output of the audio signal.

The invention concerns a method and a device for the selection of a sound algorithm for the processing of audio signals according to the preambles of Claims 1 and 28.

Modern hi-fi equipment is provided with various sound programs which distribute stereophonic audio signals to more than two loudspeakers or produce surround sound in some other way. For example, after decoding, the audio signals are split into five individual audio channels, which a so-called “virtualizer” then renders for reproduction via only two loudspeakers. Special “virtualizers” are also known which convert audio signals specifically for reproduction through headphones.

One of the best-known methods for this is the so-called “Dolby Pro Logic” method, which is used essentially with film material to control the localization of the sound. Thus, speech is usually imaged on the center channel, and noises can come exclusively from the rear loudspeakers.

Furthermore, there is a whole class of methods used to simulate acoustics. Typical names for such methods are “echo”, “stadium”, “jazz”, “club”, etc. In these methods, which are optimized for music signals, it is undesirable to reproduce vocals (singing) only from the center loudspeaker, or to emit a music signal only from the rear loudspeakers, as can happen with the “Dolby Pro Logic” method.

In the successor of Dolby Pro Logic, which is called Dolby Pro Logic II, a mode for music is provided in addition to the film mode, which takes these differences into consideration.

A method for coding speech is known from EP 0 481 374 B1. Here, a discrete transformation of a speech window is performed to obtain a discrete spectrum of coefficients. An approximate envelope of the discrete spectrum is calculated in each of a large number of sub-bands and used for the digital coding of the defined envelope of each sub-band. Within the sub-bands, each scaled coefficient is requantized into a number of bits using at least one of several quantizers of different bit lengths. The quantizer used for each sub-band is determined for each speech window by calculating the bit allocation, as a number of bits greater than or equal to zero, as a function of a power density estimate for the sub-band and a distortion error estimate for the speech window.

From EP 0 587 733 B1, a signal analysis system is known for filtering input sample values representing one or more signals. Input buffer means are provided for grouping the input samples into time-domain signal sample blocks, the input sample values being analysis-window-weighted samples. In addition, analysis means are present for producing spectral information in response to the time-domain signal sample blocks, the spectral information containing spectral coefficients which essentially correspond to an evenly stacked time-domain aliasing cancellation transform of the time-domain signal sample blocks. The spectral coefficients are essentially coefficients of a modified discrete cosine transform or coefficients of a modified discrete sine transform. The analysis means include forward pre-transform means for producing modified sample blocks and forward transform means for producing frequency-domain transform coefficients.

From EP 0 664 943 B1, a coding device is known for the adaptive processing of audio signals for coding, transmission, or storage and recovery, where the noise level fluctuates with the signal amplitude level. A processing device responds to input signals by emitting either a first and a second signal or the sum and difference of the first and second signals. The first and second signals correspond to the two matrix-coded audio signals of a four-by-two audio signal matrix, and the processing device also produces a control signal which indicates whether the first and second signals or their sum and difference are being emitted.

A decoder is known from EP 0 519 055 B1, consisting of receiving means for receiving a multiplicity of information formatted into delivery channels, deformatting means for producing, in response to the receiving means, a deformatted representation for each delivery channel, and synthesis means for producing output signals depending on the deformatted representations. A divider means is arranged between the deformatting means and the synthesis means; it responds to the deformatting means and produces one or more intermediate signals, at least one of which is produced by combining the information from two or more deformatted representations. The synthesis means produce a respective output signal in response to each of the intermediate signals.

From EP 0 520 068 B1, a coder for coding two or more audio channels is known. The coder has a sub-band device for producing sub-band signals, a mixing device for creating one or more composite signals, and means for producing control information for each corresponding composite signal. In addition, the coder has a coding device for producing coded information by allocating bits to one or more composite signals. Furthermore, a formatting device is present for combining the coded information and the control information into an output signal.

A speech coder is known from EP 0 208 712 B1. This speech coder contains a Fourier transform device for performing a discrete Fourier transformation of an incoming speech signal to produce a discrete transform spectrum of coefficients, and a normalization device for modifying the transform spectrum to produce a scaled, flatter spectrum and for coding a function by which the discrete spectrum is modified. In addition, a device is present for coding at least part of the spectrum. The normalization device has a device (44) for defining the approximate envelope of the discrete spectrum in each of several sub-bands of coefficients and for coding the defined envelope of each sub-band, as well as devices for scaling each spectral coefficient relative to the defined envelope of its sub-band.

However, all of the known solutions share the disadvantage that the sound algorithm must be selected manually. For example, if the audio of the currently selected television channel is processed by a Dolby Pro Logic II decoder and the channel is switched several times between music stations and films or news, the user must manually switch between the individual sound algorithms that process the audio data, for example between music mode and film mode, upon each change.

The task of the invention is to provide a method and a device which automatically assign a sound algorithm to an audio signal. The present invention achieves this by the features of Claims 1 and 28. Advantageous embodiments and further developments of the invention are given in the dependent claims, in the description, and in the figures.

The present invention solves the task by recognizing the nature of the audio signal and, based on this recognition, setting the sound algorithm automatically.

In order to recognize the nature of the audio signal, different quantities are defined and evaluated.

As the first quantity, the dynamic range actually present in the audio signal is determined, as follows. The sample values of the left and right audio channels are squared and added, and the resulting signal is filtered through a low-pass filter, advantageously with a cutoff frequency of about 3 Hz. Over a defined time window, advantageously about five seconds, the minimum and the maximum of this filtered signal are determined. The dynamic range in decibels then corresponds to ten times the difference of the base-10 logarithms of the two values, that is, 10·log10(max/min).
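The dynamic-range measurement just described can be sketched as follows. This is a minimal illustration, not the device's implementation: the text specifies only the cutoff frequency, so the first-order IIR low-pass and the names `one_pole_lowpass` and `dynamic_range_db` are assumptions, and a small `eps` guards against taking the logarithm of digital silence.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order IIR low-pass; the filter type is an assumption, only the cutoff is given."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for v in x:
        y = (1.0 - a) * v + a * y
        out.append(y)
    return out


def dynamic_range_db(left, right, fs, cutoff_hz=3.0, window_s=5.0, eps=1e-12):
    """Dynamic range (dB) of the smoothed L^2 + R^2 power over the last window_s seconds."""
    power = [l * l + r * r for l, r in zip(left, right)]
    smoothed = one_pole_lowpass(power, cutoff_hz, fs)
    n = max(1, int(window_s * fs))
    window = smoothed[-n:]
    lo = max(min(window), eps)  # eps avoids log10(0) during digital silence
    hi = max(max(window), eps)
    # The filtered signal is a power, so 10 * log10 of the ratio gives decibels
    return 10.0 * (math.log10(hi) - math.log10(lo))
```

Because squaring yields a power, the factor is 10 rather than 20; a steady tone gives about 0 dB, while a window that drops by a factor of 10 in amplitude gives about 20 dB.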

In another advantageous embodiment of the invention, the dynamic ranges of the left and right audio channels are calculated separately, and only the channel with the larger dynamic range is used further.

Alternatively, the absolute value can be formed instead of squaring, and instead of low-pass filtering followed by a search for the maximum, the level can be determined over short time spans, for example one third of a second; the maximum and minimum of these level values are then used to calculate the dynamic range.
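The two variants above (absolute values with short-term levels, and per-channel evaluation keeping the larger range) can be combined in a short sketch. It assumes that the level of a block is the mean absolute value, which the text does not specify; the function names are hypothetical. Since absolute values are amplitudes rather than powers, the conversion uses 20·log10.

```python
import math


def block_levels(x, fs, block_s=1.0 / 3.0):
    """Mean absolute value per block of about block_s seconds (averaging rule is an assumption)."""
    n = max(1, round(block_s * fs))
    return [sum(abs(v) for v in x[i:i + n]) / n
            for i in range(0, len(x) - n + 1, n)]


def dynamic_range_abs_db(x, fs, eps=1e-12):
    """Dynamic range (dB) from the largest and smallest block level of one channel."""
    levels = block_levels(x, fs)
    if not levels:
        return 0.0
    lo = max(min(levels), eps)
    hi = max(max(levels), eps)
    # Absolute values are amplitudes, not powers, so 20 * log10 gives decibels
    return 20.0 * math.log10(hi / lo)


def dynamic_range_larger_channel(left, right, fs):
    """Evaluate L and R separately and keep only the larger dynamic range."""
    return max(dynamic_range_abs_db(left, fs), dynamic_range_abs_db(right, fs))
```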

Film material contains large jumps in level and thus a greater dynamic range, since, for example, the signal level falls sharply during pauses in speech. Music signals, in contrast, usually have a dynamic range of about 20 dB or less. A corresponding quantity can therefore be obtained in a surprisingly simple manner by comparing the determined dynamic range with a threshold value.

If the dynamic range is greater than the threshold value, the quantity is set to −1 (film mode); otherwise it is set to 1 (music mode). Instead of this rigid division, a sliding quantity is determined in what follows. For this purpose, the dynamic range is mapped by a function onto the value range [−1.0 . . . 1.0]. A simple such function subtracts the calculated dynamic range from the threshold value, divides the result by the threshold value, and limits this value to the range [−1.0 . . . 1.0]. This value is designated M1 below. A dynamic range of 0 gives M1 = 1; a dynamic range equal to the threshold value gives M1 = 0, which is to be evaluated as neutral; and dynamic ranges greater than or equal to twice the threshold value give M1 = −1.0.
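Both mappings from dynamic range to M1 fit in a few lines. The 20 dB default threshold is an assumption suggested by the typical music dynamic range mentioned above; the text does not fix the threshold value.

```python
def clamp(v, lo=-1.0, hi=1.0):
    """Limit v to [lo, hi]."""
    return max(lo, min(hi, v))


def m1_hard(dr_db, threshold_db=20.0):
    """Rigid division: -1.0 (film mode) above the threshold, +1.0 (music mode) otherwise."""
    return -1.0 if dr_db > threshold_db else 1.0


def m1_sliding(dr_db, threshold_db=20.0):
    """Sliding quantity: (threshold - range) / threshold, limited to [-1.0, 1.0]."""
    return clamp((threshold_db - dr_db) / threshold_db)
```

With a 20 dB threshold, ranges of 0, 10, 20 and 40 dB map to 1.0, 0.5, 0.0 and −1.0, matching the three reference points given in the text.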

To prevent this quantity from responding during long signal pauses, a minimum level is assumed which lies, for example, 30 dB below the maximum value that occurred within a preceding time span, in an advantageous embodiment approximately the last 5 minutes. The maximum value found during the determination of the dynamics is used as the comparison level. If this value is below the minimum level, the quantity M1 calculated from the dynamic range is set to −1.0. For a sliding cross-fade, the range from 40 dB below the maximum level to 20 dB below the maximum level can be used: if the comparison level is more than 40 dB below the maximum level, M1 is set to −1; if it is less than 20 dB below, M1 remains unchanged; for values in between, M1 is linearly interpolated between these two limiting cases.
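The pause guard can be sketched as a function of how far the current comparison level lies below the long-term maximum. Tracking that long-term maximum (for example over about 5 minutes) is left to the caller here; the function name and argument layout are hypothetical.

```python
def guard_m1(m1, comparison_db, long_term_max_db, hard_db=40.0, soft_db=20.0):
    """Force M1 toward -1.0 during long pauses, cross-fading between soft_db and hard_db.

    comparison_db: maximum found during the dynamics determination (dB).
    long_term_max_db: maximum level seen over the preceding time span (dB).
    """
    below = long_term_max_db - comparison_db  # distance below the long-term maximum
    if below >= hard_db:
        return -1.0
    if below <= soft_db:
        return m1
    w = (below - soft_db) / (hard_db - soft_db)  # 0.0 at soft_db, 1.0 at hard_db
    return (1.0 - w) * m1 + w * -1.0
```

Setting `hard_db = soft_db = 30.0` reproduces the simple variant with a single minimum level 30 dB below the maximum; the interpolation branch is then never reached, so no division by zero occurs.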

As another quantity, designated M2 below, the periodicity of the audio signal is used. Many methods for determining the periodicity of an audio signal are known from the standard literature. A very simple method consists of squaring the sample values of the left and right channels, adding them, and filtering the resulting signal through a low-pass filter with a cutoff frequency of about 50 Hz. The maxima of this signal are then searched for. If the level maxima are found to occur periodically at intervals typical of music, that is, between one third of a second and one second, M2 is set to 1; otherwise it is set to −1.
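A minimal sketch of this periodicity test follows. The text does not say how maxima are picked or how regular they must be, so the peak rule (local maxima of at least half the global maximum) and the regularity rule (coefficient of variation of the intervals at most 0.1) are assumptions, as are all function names.

```python
import math
import statistics


def power_envelope(left, right, fs, cutoff_hz=50.0):
    """L^2 + R^2 smoothed by a first-order low-pass (filter type is an assumption)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, env = 0.0, []
    for l, r in zip(left, right):
        y = (1.0 - a) * (l * l + r * r) + a * y
        env.append(y)
    return env


def find_peaks(env, min_rel=0.5):
    """Local maxima at least min_rel times the global maximum (peak rule is an assumption)."""
    if not env:
        return []
    top = max(env)
    return [i for i in range(1, len(env) - 1)
            if env[i] > env[i - 1] and env[i] >= env[i + 1] and env[i] >= min_rel * top]


def m2_periodicity(env, fs, lo_s=1.0 / 3.0, hi_s=1.0, max_cv=0.1):
    """+1.0 if peaks recur regularly every lo_s..hi_s seconds, else -1.0."""
    peaks = find_peaks(env)
    if len(peaks) < 3:
        return -1.0
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    mean = statistics.mean(intervals)
    cv = statistics.pstdev(intervals) / mean  # relative spread of the beat intervals
    return 1.0 if lo_s <= mean <= hi_s and cv <= max_cv else -1.0
```

A typical call would be `m2_periodicity(power_envelope(left, right, fs), fs)`; a pulse every 0.5 s yields 1.0, while a pulse every 2 s yields −1.0.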

Music signals can also be identified by their spectral shapes. For example, wind and string instruments have very characteristic spectra that are easy to detect. If such spectral shapes are detected, a quantity M3 is set to 1; otherwise it is set to 0. The value −1 is not used here, since the absence of these spectra does not necessarily mean that no music signal is present. This quantity can therefore act only in the direction of a decision that music is present.
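The text does not say how instrument spectra are recognized. One simple stand-in, shown here as an assumption rather than the described method, is template matching by cosine similarity against stored instrument spectra; the templates, the 0.9 threshold and the function names are hypothetical. The sketch keeps the asymmetry required above: a match gives 1.0, and no match gives 0.0, never −1.0.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two magnitude spectra (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def m3_instrument_spectra(spectrum, templates, min_similarity=0.9):
    """1.0 if the spectrum matches any instrument template, else 0.0 (never -1.0)."""
    if any(cosine_similarity(spectrum, t) >= min_similarity for t in templates):
        return 1.0
    return 0.0
```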

Unknown instruments can also be identified in the spectrum when several tones are played, that is, when more than one tone can be detected at the same time. In this case, the spectrum typical of the instrument is present several times at different frequencies. Confusion with speech is not possible, since the spectra of different speakers differ, and one person can speak at only one pitch at any time. When such spectral constellations are detected, a quantity M4 is set to 1; otherwise, as for M3, it is set to 0. An even more reliable conclusion is possible by comparing the frequencies of these tones. If we are dealing with music, they very probably stand in a musical relationship to one another, that is, they differ only by a factor corresponding to an integer power of the twelfth root of 2. Building on such tones, music can even be detected by recognizing melodies, that is, by observing the pitches of this instrument as a function of time.
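The musical-relationship test above reduces to checking whether a frequency ratio is close to 2^(k/12) for some integer k, that is, whether 12·log2(f2/f1) is close to an integer. The sketch assumes that the fundamentals of the simultaneous tones have already been estimated upstream; the 20-cent tolerance and the function names are hypothetical.

```python
import math
from itertools import combinations


def semitones_between(f1, f2):
    """Interval from f1 to f2 in equal-tempered semitones: 12 * log2(f2 / f1)."""
    return 12.0 * math.log2(f2 / f1)


def is_musical_interval(f1, f2, tol_cents=20.0):
    """True if f2 / f1 is within tol_cents of an integer power of 2 ** (1 / 12)."""
    st = semitones_between(f1, f2)
    return abs(st - round(st)) * 100.0 <= tol_cents


def m4_from_fundamentals(fundamentals, tol_cents=20.0):
    """1.0 if two or more simultaneous tones are all musically related, else 0.0."""
    if len(fundamentals) < 2:
        return 0.0
    related = all(is_musical_interval(a, b, tol_cents)
                  for a, b in combinations(fundamentals, 2))
    return 1.0 if related else 0.0
```

An A major triad (440, 554.37 and 659.26 Hz) passes, since its intervals are 4, 3 and 7 semitones, whereas 440 Hz against 450 Hz (about 39 cents) does not.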

Since music signals usually contain several instruments playing together, whose frequency ranges are matched so that they complement rather than mask one another, music signals exhibit a relatively flat frequency curve. The flatness of the frequency curve is therefore also used as a measure of the presence of music. For this purpose, the level of the input signal, in particular of the sum of the left and right audio channels, is determined in different frequency bands, in particular from 20 Hz to 200 Hz, from 200 Hz to 2 kHz, and from 2 kHz to 20 kHz. The maximum of these band levels is determined and multiplied by the number of bands, and the levels of the individual bands are then subtracted from this product. A large result indicates that the power is spectrally concentrated in a few bands, so we are probably not dealing with music. To obtain this quantity, designated M5 below, a value range from a maximum value to a minimum value is mapped linearly onto the value range [−1.0 . . . 1.0]. Values outside this range are mapped onto the limiting values.
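The flatness measure and its mapping to M5 can be sketched as follows, assuming the band levels are already available in dB (the text does not fix the unit or the band filter design). The mapping end points, 0 dB for a perfectly flat spectrum and 40 dB for a strongly peaked one, are hypothetical.

```python
def flatness_measure(band_levels_db):
    """N * max(level) - sum(levels): 0 for a flat spectrum, large if power sits in few bands."""
    return len(band_levels_db) * max(band_levels_db) - sum(band_levels_db)


def m5_from_flatness(value, flat_value=0.0, peaked_value=40.0):
    """Map flat_value -> +1.0 and peaked_value -> -1.0 linearly, limiting outside values."""
    m = 1.0 - 2.0 * (value - flat_value) / (peaked_value - flat_value)
    return max(-1.0, min(1.0, m))
```

For the three bands named above, equal levels give M5 = 1.0, while one band 20 dB above the other two gives a measure of 40 and thus M5 = −1.0.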

A similar quantity can be derived from the number of spectral maxima with a certain minimum level. If many instruments are present, many such maxima are found. To determine another quantity, M6, the number of maxima can be mapped directly and linearly onto the value range [−1.0 . . . 1.0].
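Counting qualifying maxima and mapping the count linearly to M6 might look like this. The count range (2 maxima mapped to −1.0, 20 maxima to +1.0) and the function names are assumptions; the text only states that the mapping is linear onto [−1.0 . . . 1.0].

```python
def count_spectral_maxima(mag_db, min_level_db):
    """Number of local maxima in a magnitude spectrum (dB) that reach min_level_db."""
    return sum(1 for i in range(1, len(mag_db) - 1)
               if mag_db[i] > mag_db[i - 1]
               and mag_db[i] >= mag_db[i + 1]
               and mag_db[i] >= min_level_db)


def m6_from_count(count, few=2, many=20):
    """Map the count linearly: few -> -1.0, many -> +1.0, limited to [-1.0, 1.0]."""
    m = -1.0 + 2.0 * (count - few) / (many - few)
    return max(-1.0, min(1.0, m))
```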

Apart from the analysis of the sound material, the source also permits conclusions about the sound material. For example, when reproducing a radio broadcast or a CD, the probability of music signals is very high, whereas an AC3-coded DVD is more likely to be a film. Each source is therefore assigned an individual quantity; for example, the source CD is assigned the value 0.5 and a DVD the value −0.3. This quantity is called M7.

A total quantity MG is determined from the individual quantities M1 to M7 by weighting each of them with an individual factor and adding the results. Since M1 is of very great importance, it receives the largest weight compared with M2 to M7. In the embodiment described here, M1 is weighted with the factor 1, M2 with the factor 0.5, and M3, M4, M5, M6 and M7 each with a factor of only 0.2. Values of MG less than 0 then correspond to a signal without music, which should be reproduced in film mode, and values greater than 0 are classified as a music signal, for which music mode should be used. The more negative or positive this value, the more unequivocal the classification.
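The weighted sum uses the weights stated above and includes the source quantity M7 from the preceding paragraph. Only the CD and DVD values for M7 come from the text. Treating unknown sources, and missing quantities, as neutral (0.0) is an assumption, as are the function names.

```python
WEIGHTS = {"M1": 1.0, "M2": 0.5, "M3": 0.2, "M4": 0.2, "M5": 0.2, "M6": 0.2, "M7": 0.2}

# M7 per source; only the CD and DVD values are given in the text
SOURCE_M7 = {"CD": 0.5, "DVD": -0.3}


def m7_for_source(source):
    """Look up M7; unknown sources are treated as neutral (0.0), which is an assumption."""
    return SOURCE_M7.get(source, 0.0)


def total_quantity(q):
    """Weighted sum MG of the individual quantities; missing entries count as 0.0."""
    return sum(w * q.get(name, 0.0) for name, w in WEIGHTS.items())


def classify(mg):
    """MG > 0 -> music mode, MG < 0 -> film mode; MG == 0 is left undecided here."""
    if mg > 0:
        return "music"
    if mg < 0:
        return "film"
    return "undecided"
```

For example, M1 = 0.5, M2 = 1.0, M5 = 0.5 and a CD source give MG = 0.5 + 0.5 + 0.1 + 0.1 = 1.2, a clear music classification.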

In order to avoid frequent switching in the borderline case, that is, when the values of MG are near zero, a hysteresis is used. This means that switching from film mode to music mode occurs only when MG exceeds a value greater than 0 (for example, 0.3), and switching from music mode to film mode occurs only when MG falls below a value less than 0 (for example, −0.3).
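The hysteresis is a two-state machine with separate thresholds. This sketch uses the example thresholds of ±0.3; starting in film mode is an assumption.

```python
class ModeHysteresis:
    """Film/music decision with separate switching thresholds on MG."""

    def __init__(self, up=0.3, down=-0.3, mode="film"):
        self.up = up        # MG must exceed this to go from film to music mode
        self.down = down    # MG must fall below this to go from music to film mode
        self.mode = mode

    def update(self, mg):
        """Feed one MG value and return the resulting mode."""
        if self.mode == "film" and mg > self.up:
            self.mode = "music"
        elif self.mode == "music" and mg < self.down:
            self.mode = "film"
        return self.mode
```

Values such as 0.1 or −0.2 near zero leave the current mode unchanged, which is exactly the behavior the hysteresis is meant to provide.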

The switch between film mode and music mode takes place with a delay and an inertia, both of which can be adjusted by the user. The signal type must remain constant for the delay time; otherwise the reproduction mode is not changed. After this delay time, the modes are cross-faded with a time constant corresponding to the inertia, which avoids otherwise audible signal jumps and makes the transition from one mode to the other unnoticeable. Normally, this time constant is about 10 seconds. With very short time constants, an attempt is made to perform the change during a signal pause. In some cases, the delay time preselected by the user and the time constant of the inertia should be reduced further, for example directly after the channel has been changed on a television set whose audio signal is being reproduced. This case can easily be detected if the corresponding audio processing is implemented in the television set itself, or if the television set sends a corresponding message to the other connected equipment. Such a switching process can also be recognized by an abruptly occurring signal pause, whose duration during switching processes is typical of the particular piece of equipment.
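Delay and inertia can be combined in a small controller that produces a cross-fade weight w between the two sound programs (1.0 = music mode, 0.0 = film mode). This sketch assumes that the inertia acts as a first-order exponential approach with time constant `tau_s`, and that any change of classification before the delay has elapsed cancels the pending switch. The 5 s default delay and all names are hypothetical; only the 10 s time constant comes from the text.

```python
import math


class ModeCrossfader:
    """Delay plus inertia for mode changes; w = 1.0 is full music mode, 0.0 full film mode."""

    def __init__(self, delay_s=5.0, tau_s=10.0, w=0.0):
        self.delay_s = delay_s    # how long the new signal type must persist (user setting)
        self.tau_s = tau_s        # cross-fade time constant, i.e. the inertia (user setting)
        self.w = w                # current cross-fade position
        self.target = w           # mode currently being faded toward
        self.pending = None       # candidate new target awaiting the delay
        self.pending_time = 0.0

    def step(self, mode, dt):
        """Advance by dt seconds given the current classification ("music" or "film")."""
        desired = 1.0 if mode == "music" else 0.0
        if desired == self.target:
            # Classification agrees with the current mode: drop any pending change
            self.pending, self.pending_time = None, 0.0
        elif desired == self.pending:
            self.pending_time += dt
        else:
            self.pending, self.pending_time = desired, dt
        if self.pending is not None and self.pending_time >= self.delay_s:
            self.target, self.pending, self.pending_time = self.pending, None, 0.0
        # First-order (exponential) approach toward the target with time constant tau_s
        alpha = 1.0 - math.exp(-dt / self.tau_s) if self.tau_s > 0 else 1.0
        self.w += alpha * (self.target - self.w)
        return self.w


def mix(music_sample, film_sample, w):
    """Cross-fade between the outputs of the two sound programs."""
    return w * music_sample + (1.0 - w) * film_sample
```

A delay of 0 and a short `tau_s` make the controller react almost immediately, which is the behavior wanted right after a channel change.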

Furthermore, a channel change can be detected from the image signal, since synchronization is usually lost during switching; a loss of synchronization therefore indicates that the channel was changed. When a channel change is detected, the delay time is set to 0 and the time constant is reduced to, for example, 3 seconds. After the first subsequent determination of the sound material, and after a period long enough for cross-fading to the desired mode, the normal delay time and the long time constant can be restored.
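Choosing between the normal and the fast switching parameters can be sketched as below. The sync-loss flag is assumed to come from the video path. The abrupt-pause detector (a level drop of at least 40 dB between consecutive level blocks into a level below −70 dB) uses hypothetical thresholds; a real device would use the pause duration typical of that equipment. Restoring the normal values after the cross-fade is left to the caller.

```python
def abrupt_pause(levels_db, drop_db=40.0, silence_db=-70.0):
    """True if the level drops by at least drop_db between consecutive blocks into near silence.

    Both thresholds are hypothetical.
    """
    return any(prev - cur >= drop_db and cur <= silence_db
               for prev, cur in zip(levels_db, levels_db[1:]))


def switching_params(normal_delay_s, normal_tau_s, sync_lost, levels_db, fast_tau_s=3.0):
    """Delay 0 and a short time constant right after a detected channel change."""
    if sync_lost or abrupt_pause(levels_db):
        return 0.0, fast_tau_s
    return normal_delay_s, normal_tau_s
```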

The delay time and the inertia are also varied as a function of the absolute value of MG. Very high absolute values indicate a very clear classification, so switching can take place earlier in such cases.
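The text does not specify how |MG| shortens the delay and the inertia. One hypothetical choice, shown only as an illustration, divides both by 1 + k·|MG|, so that a very clear classification (|MG| = 1) halves them when k = 1.

```python
def scale_for_confidence(delay_s, tau_s, mg, k=1.0):
    """Shorten delay and time constant as |MG| grows; the 1 / (1 + k|MG|) law is hypothetical."""
    factor = 1.0 / (1.0 + k * abs(mg))
    return delay_s * factor, tau_s * factor
```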

Various sound programs can be used to reproduce music signals. For example, the difference signal between the left and right input signals can be output on the rear loudspeakers while the front channels remain unaffected. In addition, the difference signal can be preprocessed individually for the two rear channels, usually with all-pass filters, which decorrelates the rear loudspeakers. Alternatively, a sound program frequently called “echo” can be used for music signals. In this program, in addition to the difference signal, an echo portion of the original signal and of the difference signal is emitted from all loudspeakers. Common to all such sound programs suitable for music signals is that the stereo width is largely retained, that is, little or no signal is emitted from the front center loudspeaker, and that no active matrixing takes place, so the level of the front channels is not reduced when the difference signal of the input channels becomes large compared with their sum.
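The first music program described above can be sketched as a passive upmix. The front channels pass through unchanged, the center receives nothing, and the difference signal feeds both rear channels through first-order all-pass filters with different coefficients so that the rears are decorrelated. The rear gain of 0.5, the coefficients ±0.3 and the channel names are assumptions.

```python
def allpass1(x, a):
    """First-order all-pass H(z) = (a + z^-1) / (1 + a z^-1), with |a| < 1."""
    y, x1, y1 = [], 0.0, 0.0
    for v in x:
        out = a * v + x1 - a * y1
        y.append(out)
        x1, y1 = v, out
    return y


def music_upmix(left, right, rear_gain=0.5, a_left=0.3, a_right=-0.3):
    """Passive 5-channel upmix: no center, no active matrixing, decorrelated rears."""
    diff = [rear_gain * (l - r) for l, r in zip(left, right)]
    return {
        "FL": list(left),            # front channels left untouched
        "FR": list(right),
        "C": [0.0] * len(left),      # no center signal, so the stereo width is kept
        "RL": allpass1(diff, a_left),
        "RR": allpass1(diff, a_right),
    }
```

Because an all-pass filter changes only the phase, the rear channels keep the energy of the difference signal, and a mono input (L = R) leaves the rears silent.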

For signals other than music, Dolby Pro Logic or a similar method is used, for example. In this case, first, the level of the front channels is reduced when the difference signal of the input is high compared with the sum signal. Second, if the difference signal is very small, the signals of the front left and right channels are redirected to the front center channel in order to place speech in the middle.
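The two behaviors of the non-music program can be illustrated with a simplified sum/difference steering rule. This is explicitly not the Dolby Pro Logic decoder; the energy-ratio rule and the 0.1 limit for a "very small" difference signal are assumptions chosen only to show the direction of each effect.

```python
def steering_gains(left, right, eps=1e-12):
    """Return (front_gain, center_share) for one block of samples.

    A simplified sum/difference rule, NOT the Dolby Pro Logic decoder:
    front_gain falls as the difference signal dominates the sum signal,
    center_share rises toward 1.0 as the difference signal becomes very small.
    """
    s = sum((l + r) ** 2 for l, r in zip(left, right)) + eps  # sum-signal energy
    d = sum((l - r) ** 2 for l, r in zip(left, right)) + eps  # difference-signal energy
    ratio = d / (s + d)                    # 0.0: centered content, 1.0: pure difference
    front_gain = 1.0 - ratio               # reduce the front channels when L - R dominates
    center_share = 1.0 - min(1.0, ratio / 0.1)  # 0.1 is a hypothetical "very small" limit
    return front_gain, center_share
```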

Instead of a 5-loudspeaker configuration, even more loudspeakers can be used, so that, for example, the difference signal is emitted from three rear loudspeakers.

The invention is explained below with the aid of a specific practical example, which shows a device according to the invention. The device V according to the invention has a signal input E, a source information input Q, and a signal output A. Audio data are fed to device V through input E, in particular stereo audio data, that is, audio data in a two-channel format. If the data are supplied in analog form, channel separation and digitization of the audio signal take place in an upstream device, and digital data are then fed to device V. The device V can, however, be extended so that it also processes multichannel audio data, for example in the AC3 format. A purely analog realization is also possible if the devices V8, V4, V5, V6 and V7 are realized as corresponding analog variants using filter banks instead of the FFT, or if the evaluation of these characteristics is omitted.

The audio signals fed to device V through input E are fed at the same time to the various other devices V1 to V10.

Devices V1 to V7 evaluate the input audio signal, and each has an associated device VM1 to VM7 for mapping onto a quantity. Here, device VM1 maps onto quantity 1, device VM2 onto quantity 2, and so on.

Furthermore, device V1 serves to determine the dynamics, device V2 the level, device V3 the periodicity, device V4 frequency spectra, in particular those of musical instruments, device V5 the flatness of the frequency curve of the audio signal, device V6 the number of maxima in the frequency spectrum, and device V7 the amount of similar spectral structures in the frequency spectrum. Device V8 transforms the audio signals from the time domain into the frequency domain, device V9 processes music signals, device V10 processes other signals, device V11 detects switching processes, and device V12 maps onto a factor for controlling the switching speed.

The quantities obtained from devices VM1 to VM7 are weighted with weighting factors G1 to G7 and added. The total quantity obtained in this way is weighted again by devices V11 and V12 and passed through the hysteresis device H. The hysteresis device H ensures that switching from film mode to music mode, and vice versa, occurs only when the total quantity exceeds or falls below a predefined value. The total quantity is then fed to an integrator I, which advantageously limits it to the range [−0.5 . . . 1.5], and to a device B, which limits it to the range [0 . . . 1.0].

The total quantity, after passing through integrator I and device B, is used as the weight with which the audio signals originating from devices V9 and V10 are weighted and added. The corresponding audio processing mode is chosen in this way.
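The chain of integrator I, limiter B and the weighted addition of the V9 and V10 outputs can be sketched as follows. Assumptions: the hysteresis output is +1.0 in music mode and −1.0 in film mode, and the integration rate stands in for the speed factor supplied by V11 and V12; the class and parameter names are hypothetical. One plausible reading of the wider integrator range [−0.5 . . . 1.5] is that it holds the weight at a limit for a while after the mode flips, as the usage values below show.

```python
def clamp(v, lo, hi):
    """Limit v to [lo, hi]."""
    return max(lo, min(hi, v))


class OutputStage:
    """Integrator I (range [-0.5, 1.5]) followed by limiter B (range [0, 1.0])."""

    def __init__(self, rate_per_s=0.1):
        self.state = 0.0
        self.rate = rate_per_s  # hypothetical integration speed (role of V11/V12)

    def step(self, hysteresis_out, dt):
        """hysteresis_out: +1.0 for music mode, -1.0 for film mode; returns the weight w."""
        self.state = clamp(self.state + self.rate * hysteresis_out * dt, -0.5, 1.5)
        return clamp(self.state, 0.0, 1.0)


def mix(music_sample, film_sample, w):
    """Weighted addition of the V9 (music) and V10 (other) outputs."""
    return w * music_sample + (1.0 - w) * film_sample
```

With `rate_per_s=0.5`, the weight stays at 1.0 for one step after a switch back to film mode (the integrator falls from 1.5 to 1.0) and only then starts to decrease.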

LIST OF REFERENCE SYMBOLS

-   A Output (5 channel)
-   B Device for limiting to region [0 . . . 1.0]
-   G1, G2, G3, G4, G5, G6, G7 Weighting factors
-   H Hysteresis device
-   I Integrator
-   VM1 Device for mapping on quantity 1
-   VM2 Device for mapping on quantity 2
-   VM3 Device for mapping on quantity 3
-   VM4 Device for mapping on quantity 4
-   VM5 Device for mapping on quantity 5
-   VM6 Device for mapping on quantity 6
-   VM7 Device for mapping on quantity 7
-   V1 Device for the determination of the dynamics
-   V2 Device for level determination
-   V3 Device for periodicity determination
-   V4 Device for the determination of frequency spectra of musical instruments
-   V5 Device for the determination of the flatness of the frequency curve
-   V6 Device for the determination of the number of maxima in the frequency spectrum
-   V7 Device for the determination of the amount of similar spectral structures in the frequency spectrum
-   V8 Device for transformation into the frequency range
-   V9 Device for processing of music signals
-   V10 Device for processing of other signals
-   V11 Device for detection of switching processes
-   V12 Device for mapping on a factor for controlling the switching speed

1-28. (canceled)
 29. Method for the selection of a sound algorithm for the processing of an audio signal, wherein the audio signal is analyzed and, based on the analysis, the nature of the audio signal is determined, the audio signal is classified as a music signal or another signal and, depending on the classification, at least one of a plurality of different sound algorithms is used for further processing and later reproduction of the audio signal, the method comprising, for the classification of the audio signal: determining a plurality of different individual quantities (M1 to M6) from at least one of the audio signal and the source of the audio signal (M7), weighting the determined quantities (M1 to M7) differently, determining a total quantity (MG) for classifying the audio signal by weighted addition of the individual quantities (M1 to M7), and applying a hysteresis to the resulting quantity so as to avoid frequent switching at a switching threshold when the fluctuations about the switching threshold are small.
 30. Method according to claim 29, wherein the audio signal is a stereophonic audio signal.
 31. Method according to claim 29, wherein the audio signal comprises at least two audio channels.
 32. Method according to claim 29, wherein in the case of a music signal, a sound program is chosen which retains the stereo width to the greatest possible extent or entirely.
 33. Method according to claim 29, wherein in the case of a music signal, a sound program is chosen which produces no reduction, or only a slight reduction, of the level of two audio channels, which are front channels.
 34. Method according to claim 29, wherein in the case of signals other than music, a sound program is chosen which is compatible with Dolby Pro Logic.
 35. Method according to claim 29, wherein, depending on the classification of the audio signal, the parameters to be adjusted for music and film material are chosen automatically.
 36. Method according to claim 29, wherein the audio signal comprises at least three audio channels including a front center channel and front left and right channels, wherein the front center channel is switched to the front left and right channels, and wherein the degree of switching is set individually.
 37. Method according to claim 29, wherein the dynamic range of the input signal and/or its level is used as a first quantity (M1) for the classification of the audio signal.
 38. Method according to claim 29, wherein the periodicity of the audio signal is used as a second quantity (M2) for the classification of the audio signal.
 39. Method according to claim 29, wherein the presence of signal spectra typical of music is used as a third quantity (M3) for the classification of the audio signal.
 40. Method according to claim 39, wherein the typical signal spectra of wind instruments or string instruments are recognized.
 41. Method according to claim 29, wherein the flatness of the frequency curve of the audio signal is used as a fourth quantity (M4) for the classification of the audio signal.
 42. Method according to claim 29, wherein the number of maxima with a certain minimum level to be observed in the spectrum is used as a fifth quantity (M5) for the classification of the audio signal.
 43. Method according to claim 29, wherein the presence of similar spectral structures at different frequencies in a spectrum is used as the sixth quantity (M6) for the classification of the audio signal.
 44. Method according to claim 29, wherein the nature of the source of the audio signal is used as a seventh quantity (M7) for the classification of the audio signal.
 45. Method according to claim 44, wherein the source of the audio signal is a CD, a DVD, a data file, a radio signal receiver, an audio radio signal receiver, a satellite radio signal receiver, a cable radio signal receiver, or a television transmission receiver.
 46. Method according to claim 45, wherein the data file is an MP3 file.
 47. Method according to claim 29, wherein switching to another sound algorithm is performed only when the classification of the audio signal is constant for a time period, said time period being adjustable.
 48. Method according to claim 29, wherein two sound algorithms can be cross-faded into one another and the time for cross-fading can be adjusted by the user.
 49. Method according to claim 48, wherein the duration in which the classification of the audio signal is determined and the time for cross-fading of a sound algorithm into another sound algorithm are reduced as a function of the total quantity (MG) when the total quantity (MG) yields an unequivocal classification.
 50. Method according to claim 48, wherein switching processes of the source signals are recognized and in these cases the duration for the classification of the audio signal and the time for cross-fading of a sound algorithm into another sound algorithm are reduced.
 51. Method according to claim 50, wherein the switching processes are recognized by an abruptly occurring signal pause.
 52. Method according to claim 50, wherein the switching processes are recognized by a synchronization loss of an image signal.